DBMSs Should Talk Back Too
Natural language user interfaces to database systems have been studied for
several decades now. They have mainly focused on parsing and interpreting
natural language queries and translating them into a formal database language.
We envision the reverse functionality, where the system would be able to take
the internal result of that translation, say in SQL form, translate it back
into natural language, and show it to the initiator of the query for
verification.
Likewise, information extraction has received considerable attention in the
past ten years or so, identifying structured information in free text so that
it may then be stored appropriately and queried. Validating the stored records
through a backward translation into text would again be very powerful.
Verification and validation of the query and data input of a database system
are just one example of the many important applications that would
benefit greatly from having mature techniques for translating such database
constructs into free-flowing text. The problem appears deceptively simple: as
there are no ambiguities or other complications in interpreting internal
database elements, a straightforward translation initially appears adequate.
Reality teaches us quite the opposite, however, as the resulting text should be
expressive, i.e., accurate in capturing the underlying queries or data, and
effective, i.e., allowing their fast and unambiguous interpretation.
Achieving both of these qualities is very difficult and raises several
technical challenges that need to be addressed. In this paper, we first expose
the reader to several situations and applications that need translation into
natural language, thereby, motivating the problem. We then outline, by example,
the research problems that need to be solved, separately for data translations
and query translations.Comment: CIDR 200
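The verification loop the abstract envisions can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's system: the function name, template wording, and the tiny set of supported operators are all invented here, and a real translator would handle arbitrary SQL.

```python
# Hypothetical sketch: translate a parsed SQL SELECT back into English so the
# query's author can verify how the system interpreted it.

def sql_to_english(select_cols, table, where=None):
    """Render a simple SELECT query as an English sentence."""
    cols = ", ".join(select_cols) if select_cols != ["*"] else "all attributes"
    sentence = f"Find {cols} of every {table}"
    if where:
        col, op, val = where
        # Only a few comparison operators are covered in this sketch.
        op_text = {"=": "is", ">": "is greater than", "<": "is less than"}[op]
        sentence += f" whose {col} {op_text} {val}"
    return sentence + "."

print(sql_to_english(["name", "salary"], "employee", ("salary", ">", "50000")))
# Find name, salary of every employee whose salary is greater than 50000.
```

Even at this toy scale, the abstract's tension is visible: the template must stay accurate to the query (expressive) while reading naturally enough to be checked at a glance (effective).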
Requirement-driven creation and deployment of multidimensional and ETL designs
We present our tool for assisting designers in the error-prone and time-consuming tasks carried out at the early stages of a data warehousing project. Our tool semi-automatically produces multidimensional (MD) and ETL conceptual designs from a given set of business requirements (like SLAs) and data source descriptions. Subsequently, our tool translates both the MD and ETL conceptual designs produced into physical designs, so they can be further deployed on a DBMS and an ETL engine. In this paper, we describe the system architecture and present our demonstration proposal by means of an example.
Peer Reviewed. Postprint (author's final draft).
Adversarial Learning in Real-World Fraud Detection: Challenges and Perspectives
The data economy relies on data-driven systems, and complex machine learning
applications are fueled by them. Unfortunately, machine learning models are
exposed to fraudulent activities and adversarial attacks, which threaten their
security and trustworthiness. In the last decade or so, research interest in
adversarial machine learning has grown significantly, revealing how learning
applications can be severely impacted by effective attacks. Although early
results of adversarial machine learning indicate the huge potential of the
approach in specific domains such as image processing, there is still a gap in
both the research literature and practice regarding how to generalize
adversarial techniques to other domains and applications. Fraud detection is a
critical defense mechanism for the data economy, as it is for other
applications, and it poses several challenges for machine learning. In this
work, we describe how attacks against fraud detection systems differ from
other applications of adversarial machine learning, and we propose a number of
interesting directions to bridge this gap.
GEM: requirement-driven generation of ETL and multidimensional conceptual designs
Technical Report
At the early stages of a data warehouse design project, the main objective is to collect the business requirements and needs, and translate them into an appropriate conceptual, multidimensional design. Typically, this task is performed manually, through a series of interviews involving two different parties: the business analysts and the technical designers. Producing an appropriate conceptual design is an error-prone task that undergoes several rounds of reconciliation and redesigning until the business needs are satisfied. It is of great importance for the business of an enterprise to facilitate and automate such a process. The goal of our research is to provide designers with a semi-automatic means for producing conceptual multidimensional designs and also a conceptual representation of the extract-transform-load (ETL) processes that orchestrate the data flow from the operational sources to the data warehouse constructs. In particular, we describe a method that combines information about the data sources with the business requirements, for validating and completing these requirements if necessary, producing a multidimensional design, and identifying the ETL operations needed. We present our method in terms of the TPC-DS benchmark and show its applicability and usefulness.
Preprint
Synthesizing structured text from logical database subsets. EDBT
In the classical database world, information access has been based on a paradigm that involves structured, schema-aware queries and tabular answers. In the current environment, however, where information prevails in most activities of society, serving people, applications, and devices in dramatically increasing numbers, this paradigm has proved to be very limited. On the query side, much work has been done on moving towards keyword queries over structured data. In our previous work, we have touched the other side as well, and have proposed a paradigm that generates entire databases in response to keyword queries. In this paper, we continue in the same direction and propose synthesizing textual answers in response to queries of any kind over structured data. In particular, we study the transformation of a dynamically-generated logical database subset into a narrative through a customizable, extensible, and template-based process. In doing so, we exploit the structured nature of database schemas and describe three generic translation modules for different formations in the schema, called unary, split, and join modules. We have implemented the proposed translation procedure into our own database front end and have performed several experiments evaluating the textual answers generated as several features and parameters of the system are varied. We have also conducted a set of experiments measuring the effectiveness of such answers on users. The overall results are very encouraging and indicate the promise that our approach has for several applications.
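The template-based idea behind two of the translation modules can be sketched as follows. This is a hypothetical illustration only: the module names (unary, join) come from the abstract, but the function signatures, templates, and example data are invented here, and the paper's split module and schema-driven template selection are omitted.

```python
# Hypothetical sketch: rendering tuples of a logical database subset into
# narrative text with fill-in templates.

def unary_module(row, template):
    """Render a single tuple of one relation via a fill-in template."""
    return template.format(**row)

def join_module(left_row, right_row, template):
    """Render two joined tuples (e.g. linked by a foreign key) as one sentence,
    prefixing attribute names to keep the two sides distinct."""
    merged = {**{f"l_{k}": v for k, v in left_row.items()},
              **{f"r_{k}": v for k, v in right_row.items()}}
    return template.format(**merged)

movie = {"title": "Casablanca", "year": 1942}
director = {"name": "Michael Curtiz"}
print(unary_module(movie, "{title} was released in {year}."))
print(join_module(movie, director, "{l_title} was directed by {r_name}."))
```

Making such templates customizable and extensible per schema element, as the abstract describes, is what turns this from string formatting into a usable narrative front end.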
Quarry : digging up the gems of your data treasury
The design lifecycle of a data warehousing (DW) system is primarily led by the requirements of its end-users and the complexity of the underlying data sources. The process of designing a multidimensional (MD) schema and back-end extract-transform-load (ETL) processes is a long-term and mostly manual task. As enterprises shift to more real-time and 'on-the-fly' decision making, business intelligence (BI) systems require automated means for efficiently adapting a physical DW design to frequent changes of business needs. To address this problem, we present Quarry, an end-to-end system for assisting users of various technical skills in managing the incremental design and deployment of MD schemata and ETL processes. Quarry automates the physical design of a DW system from high-level information requirements. Moreover, Quarry provides tools for efficiently accommodating MD schema and ETL process designs to new or changed information needs of its end-users. Finally, Quarry facilitates the deployment of the generated DW design over an extensible list of execution engines. On-site, we will use a variety of examples to show how Quarry helps manage the complexity of the DW design lifecycle.
Peer Reviewed. Postprint (published version).
Interaction Mining: Making Business Sense of Customers Conversations through Semantic and Pragmatic Analysis
Via the Web, a wealth of information for business research is ready at our fingertips. Analyzing this unstructured information, however, can be very difficult. Analytics has become the business buzzword distinguishing traditional competitors from 'analytics competitors' who have dramatically boosted their revenues. The latter distinguish themselves through "expert use of statistics and modeling to improve a wide variety of functions" (Davenport, 2006, p. 105). However, not all information lends itself to statistics and models. Actually, most information on the Web is made for, and by, people communicating through 'rich' language. This richness of our language, and with it its real meaning, is typically missed or not adequately accounted for in (statistical) analytics (e.g., text mining), because it is hidden in semantics rather than form (e.g., syntax). In our efforts to turn unstructured data into structured data, important information, and with it our ability to distinguish ourselves from competitors, gets lost.
Using semantic web technologies for exploratory OLAP: A survey
Peer Reviewed. Postprint (author's final draft).